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ABSTRACT 

In the context of a recapture sampling design for debugging experiments we 
consider the problem of estimating the error or hitting rate of the faults remaining 
in a system. Moment estimators are derived for a family of models in which the 
rate parameters are assumed proportional to the tail probabilities of a discrete 
distribution on the positive integers. The estimators are shown to be asymptotically 
normal and fully efficient. Their fixed sample properties are compared, through 
simulation, with those of the conditional maximum likelihood estimators. 

Key words and phrases: software reliability; asymptotic efficiency; conditional 
likelihood; average information; interval truncated sampling. 
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l. Introduction 


Nayak (1988) recently proposed a recapture sampling design for estimating 
the number of errors or faults remaining in a system. As is common in debugging 
experiments, a system is tested for a time period of length r, the failure (i.e., error 
detection) times Ti,T 2 ,...,T r are observed, and repair takes place immediately 
after a fault produces an error. By using standard error detection techniques (e.g., 
Avizienis and Chen, 1977) the hitting frequency Mi = Mi(T{, r) of the fault detected 
at time T, is observed as the number of times the region (i.e., path in software) 
containing the fault is accessed during the interval (T,, r). Nayak’s (1988) discussion 
concerns the Jelinski-Moranda (1972) model A,- = (u— i+l)<f>, <f> > 0, t = 1,2, . . . >u 
where u, a parameter, is the initial number of faults in a system. 

The purpose of the present paper is to study estimation procedures related to 
the following model. The spacings Yi = T, - T,_i (To = 0), * = 1, 2, . . . are assumed 
independent and exponentially distributed with rate parameters A,- given by 

A,- = a G(i - 1, <f>), a > 0, & = a g(i, <f>) (1) 

That is, A,- is the rate of encountering the remaining faults after * — 1 faults have 
been removed and £,• = A»- — A»+i are the hitting rates of the first, second, etc., 
detected faults. In (1), G{x,<j > ) = 1 — G(x, 4>), is the distribution function of a 
discrete positive random variable and g(x, <f>) is the density function or probability 
mass function of G(x, <f>). The quantity £, can be interpreted as the amount by 
which A,- decreases when repairing the fault detected at time T,-. Counts {M,(t)} 
of repeated error occurrences are assumed to be independent homogeneous Poisson 
processes with rate parameters 

In this context the J-M model is given by a discrete uniform distribution with 
mass at 1,2, This model, however, assumes that faults have a common 
rate & = <f> whereas experimental investigations (e.g., Nagel, Scholz, and Skrivan, 
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1984) indicate that faults may have different hitting rates. The log linear rate 
model A,- = (Cox and Lewis 1966) corresponds to a geometric distribution 

and describes the case £1 > £ 2 > ... in which faults having the highest hitting 
rates are detected early in the debugging process. Other models that seem to 
be related to (1) are those of Sandland and Cormack (1984) and Miller (1986). 
For our purpose it suffices to take g(i, <f>) to be the discrete exponential family of 
densities g(i,<j>) = exp[a,<£ — tp(<f>) + 6,-]. These models yield a sufficient statistic of 
smaller dimension than obtained in general, although for most families the likelihood 
function (Section 2) does not have exponential family structure. 

The main problem we consider is that of estimating the error (or hitting) rate 
A r+ i for a system in its final state; i.e., a system for which R = r faults have been 
removed. Moment estimators of (a, <j>) are presented in Section 3. Their bias and 
asymptotic variances are compared with those of the maximum likelihood estimators 
in Section 4. In Section 3 we show that functions of the form r -1 / 2 ln(A r+1 /A r+1 ) 
have a limiting (r — ► oo) normal distribution under various models. The conditional 
likelihood function given in Section 2 defines the setting of our discussion. 

2. A Conditional Likelihood 

We assume that a system is tested until no errors are detected for a time 
period of length s. Data is obtained through interval truncated sampling by which 
we observe T % , T 2 , . . . , Tr and R = r providing Yi < s, t = 1, 2, . . . , r and F r +i > s - 

With Y u Y 2 ,..., being independent exponential random variables, the condi- 
tional density of Yi , Y 2t . . . , Yr given R — r, is 


/(yi,y 2 ,-..,yr | r) 

r 

= JjA,exp(-A,y 1 )[l-exp(-A t s)] _1 , 0 < y, < s; t = l,2,...,r (2) 

*=i 
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The total test time r = Fi + s is random while in Nayak’s (1988) discussion, r 
is a fixed quantity. 

The full data vector can be represented in terms of the vector quantities Z k 
defined by 

Zi = {Y{), Z k = {Y k ,M lk ,M 2k ,...,M (k - l)k ) (fc = 2,3, ...,r + 1) (3) 

Here < k, is the number of times the system encounters the tth detected 

fault during the interval (T*_i , T*]. The last interval (TV, r] has fixed length s while 
the remaining intervals (T*_i , Tfe], k < r, have random length Y k . For notational 
convenience, we let Y r + * = s. 

Our earlier assumption that are independent homogeneous Poisson 

processes together with Y\,Y 2 ,. . . ,Y r being independent implies that Z\,Z 2 , . . ., 
Z r+ i are conditionally, given R = r, independent with densities 

9k{yk; mik, m 2 fc, . . . , m( k -i) k ) 

k- 1 

= - '-’“■•r 1 n /”»•*! (* = 1.2 ') 

1=1 

= (* = r+l) (4) 

* = 1 

Substituting A,- = aG(i — 1, 4>) and £,• = ag{i,<t>) in (4), the log likelihood is 
l c = Hi +1 l k where 

k—l k—1 

lk = C k (a,<t>)~ ay k + In a m ik + m ik In g(i, <i>)+C (5) 

»=i t=i 

Here Y r + i = s, C does not depend upon (a, <£), and 

C k {a, <)>) = ln[A*(l - k = 1, 2, . . . , r 

— k — t 1 
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Let V£ = (Vik,V 2 k,V 3 k), k = 1 , 2 , ...,r+ 1 (prime denotes vector transpose) be 


defined by 


Vi k = n, 


As— 1 fc-l 

v* k = ^ Af.fc. ^3* = ^2 a ' M 'k 
*=1 »=1 


( 6 ) 


where a, are constants. 

The following moments are needed to obtain the average information matrix 
and also in Section 3, to study the asymptotic distribution of S r = Y2k=i Vk- 


THEOREM 1. Let the means, variances and covariances of the elements ofVk be 
denoted by fiik — ! * = 1,2,3 and Oijk — C ov {V(k, Vjk) t, j — 1,2,3, with 


Ilk = YiiZi a iti and 72 k = £*=/ « 2 &- 
Then 

(Hk = ^k 1 ~ s{e Xk ‘ - l)- 1 

l*3k = Plkllk 
°12 k = ~ A k ) 

&22k = /*lfc(Al — Afc) + <7nfe(Ai - Ajfc) 2 

<*33k ~ Mlfc72fc + Vllkllk 


p2k = ^lfc(Al ~ A k) 

ouk = K 2 ~ sV fca (e Afca - 1)“ 2 
°\Zk = ^llfc7lfc 

023* = Ml*7l* + O'llfcCAl ~ \k)llk 


These moments can be obtained by noting that Y* has an exponential distri- 
bution truncated over the interval (0,s) and that {M^} are conditionally, given 
Y k , independent Poisson random variable with means £,-Yfc. Since V3 k is a linear 
function of Af,*,* < k, these moment are similar to those given in [3]. 

By taking derivatives and expectations, the Fisher information matrix Ak = 

( a ijk), based on Ik, can be obtained as follows: 

d 2 lk 

auk = ~ a “Vi k + a“ 2 (AfcOnfc - A^ifc) 

d 2 lk , , , 

°12 k = ~ E \ Q a Q^ ~ 'hki/^ik ~ Afc^iifc) 

022 * = -E{^p) = OLfi X kl 2 k + a 2 <r uk{lik ) 2 
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k —1 ~ A :— 1 ^ 

»=1 a<p »=1 a<p 

Since Yk converges (as k —*■ oo ) in distribution to a uniform distribution on the 
interval (0, s ) the moments of Yk converge to the corresponding moments of the 
limiting uniform distribution (Serfling, 1980, p.14 ). We thus have lim/ii* = s/2, 
limoiijfe = s 2 / 12, and lim A* = 0 as k — *• oo Assuming that g(i, <f>) is a regular family 
with support not depending upon <f>, the limits 7,- = lim7( fc , * = 1,2 are given by 


OO r \ OO ~ 

Ifl = H ^2 = ^[^ln<7(t',^)] 2 .<7(t,^) 

^ 1=1 ^ 


1=1 


where 72 is the Fisher information about the parameter <}> based on a single ob- 
servation from g(i, <f>). Thus 7i = 0 and the limiting average information matrix 
A = lim(l/r)(Ai + A2 + ... + A r +i) is A = (o,y) where an = s/2, 012 = 0, and 
022 = oc^s/ 2 . 


3. Estimation 

We now consider exponential family rate models given by 

& = aexp[<£a(t) - W) + &(*)]>* = 1,2,.... (7) 

where <j> varies over the natural parameter set {<f> : exp[<£a(t) + 6(t')] < 00}. 

This family includes the Poisson (£,• = aO l ~ l e~ e /(i — l) !, 0 > 0, * = 1,2,....) and 
log linear model as well as other models. 

Let Vk be defined as in (6) where a,- is the coefficient of <f> in (7). In reference 
to l c = 53i +1 Ifc defined by (5), we have the following: 

(i) V U V 2 ,..., V r+ i are independent. 

(ii) S r = J3i +1 Vk is a sufficient statistic for the family defined by l c . 

(iii) Si = (Sir, S 2r , S 3 r) is given by S lr = r, S 2r = Ei M i> s 3r = Ei a * M i 
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THEOREM 2 . Under (7) where a, > 0 is nondecreasing in i = 1 , 2, . . (1 /r)S r has 
a limiting (as r —>■ oo) normal distribution with mean vector \i' = (mi,M2>M3) and 
covariance matrix (1 jr)C given by 

Mi = s /2 M2 = as /2 m3 = astp' 

c u = s 2 /12 C12 = as 2 / 12 C13 = (as 2 / 12)^' 

c 2 2 = as/2 + a 2 s 2 / 12 c 2 3 = c 2 20' c 33 = (as/ 2)V»" + c 2 2(V'') 2 

The proof is given in Section 5 . 

Note that a = 0 i(mi>/* 2,M3) and 0 '(<£) = 02 (mi>M 2 ,M 3 ) where 01(21,22,23) = 
22/21, ^2(21,22,23) = 23/22, Applying the 5 -method, the estimates (a,<£) given by 

s = £ Mi/r, v' (I) = £ °‘ M ‘/ E 

1=1 1=1 1=1 

have a limiting normal distribution with mean vector (a, (f>), and are asymptotically 
independent with variances ag 2 = 2 a/rs , a^ 2 = 2[ras^ ,, (^)] -1 . 

In estimating A r +i by A r +i = d< 5 (r; 4>) we must account for the fact that 
r increases as r 1//2 (a — a) and r 1 / 2 ^ — <£) converge to their limiting distribu- 
tions. For the log linear rate model A t - = ae _ ^(* -1 ) with <j> = — /?,/? > 0 , we have 
r -1 / 2 ln(A r+ i/A r+ i) = r l l 2 (4>~4>) so that by Theorem 2, r -1 / 2 ln(A r +i/A r +i) has a 
limiting (r — ► 00) normal distribution with mean zero and variance 2[ast// , (<£)] -1 = 
2e*(e“* - l) 2 (as)“ 1 . 

To deal with the other models in Table 1, we apply Taylor’s formula to H(<j>) = 
— In G(r ; <f>) and obtain 

r -1 / 2 ln(A r+ i/A r+ i) = r -1 / 2 (lna - lna)-r -1 / 2 (<£ - 4>)H'(<t>*) 

— (l/ 2 )r(<£ — (p) 2 r~ 3 ^ 2 H"(<f>*) (8) 

where | <j>* - <f> \ < \4> — <f>\. The first term of (8) converges in probability to zero. 
Under the Poisson and logarithmic series models, r -1 |iT , (<£*)| converges to 1, and 


7 


r~ 3 / 2 H"(<f>*) converges in probability to zero. Thus for all of the models of Table 
1, the limiting distribution of r -1 / 2 ln(A r+ i/A r+ i)is identical to that of r 1//2 (<^> — <j>) 
and is a normal distribution with mean zero and variance 2[asip" (tf))] -1 . 

4. Efficiency and Bias 

Since 05 ? = 2 oc/(rs) and cr^ 2 = 2[rotsip"(<f>)]~ 1 where 0"(<£) = 72 is defined at 
the end of Section 2, it follows that a and <i> are asymptotically fully efficient. 

To study the fixed sample properties of a. and <f>, we simulated their values for 
the Poisson rate model under the conditional likelihood defined in (6). This was 
done by generating 200 replicates of (Ti, T 2 , ... ,T r , Mi, M 2 , ..., M r ) for the values 
of r shown in Table 2. In addition the conditional maximum likelihood estimates 
a c and <f> c were calculated for each replicate by maximizing l c = Ik where 

is defined in (5) . 


r 


A 

OCc 

X" 

<t>c 

a 

4> 

15 

Bias 

-.012464 

.020852 

-.001241 

-.050126 


MSE 

.000155 

.000435 

.000001 

.002512 

20 

Bias 

-.010878 

.004578 

-.002542 

-.025457 


MSE 

.000018 

.000021 

.000006 

.000648 

25 

Bias 

-.008613 

-.028581 

-.001379 

-.046221 


MSE 

.000074 

.000817 

.000002 

.002136 

30 

Bias 

-.007036 

.012544 

-.001484 

.022037 


MSE 

.000050 

.000157 

.000002 

.000485 


Table 2. Bias and mean square error (MSE) of the conditional maximum likelihood 
estimators a e and 4> c and moment estimators a and 4> based on 200 simulations 
with a = .10 and <j> = —2.00. 

Table 2 shows the bias and mean square error (MSE) for a. and 4> and also the 
bias and MSE for a c and <j) c . Although the conditional MLE <f> c has smaller bias 
than <j>, the moment estimator a seems to generally have smaller bias than a c . 
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5. Proof of Theorem 2 


To prove Theorem 2, note that the elements of n and C are given by m = 
lim(l/r) * = 1»2, 3 and c,y = lim(l/r) Y2k=\ a %jk as r tends to infinity, 

where the terms in these sums are the moments given in Theorem 1. Since fi{ k 
and Oijk converge to finite limits as k tends to infinity, we have /x t - = lim/i,* and 
Cij = limcr t y fc . Thus the calculations are similar to those discussed at the end of 
Section 2. The remainder of the proof requires showing (Serfling, 1980, p.30) that 

r+1 

lim(l/r) ^2h k r=0 (9) 

*=i 

where 

h kr = E[U k I(U k > e 2 r)] 

3 

Uk = £(V* - ( l0 ) 

1=1 

and /(.) is the indicator function. Since h kr < (e 2 r)~ l E(U k ), the limit in (9) can 
be established by examining the fourth moments of the V ik ,i = 1,2,3. 

To obtain bounds for these moments, we replace Z k by 

Z k = (Tfcj N k , Xi k , X 2 ki • • • 1 Xff k ) (k = 1, 2, . . . , r + 1) 

where N k = ^ t=1 M{ k and Xj k takes the value Xj k = i if the jth event occurring 
in the interval ( T k -i,T k ) corresponds to the occurrence of the tth detected fault, 
i = 1, 2, . . . , k — 1. Given that N k = n, X \ k , X 2 k , . . . , X nk are i.i.d with truncated 
density g[i ;^) /G(k — 1 \ = 1,2, . . . , k - 1. 

In terms of Z k , the vectors V k defined in (6) can be written 

Nk 

Vi * = n. V lt = Nt V 3k = J 2 <Xjk) (11) 

i=l 
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where a(») = a,-. 

Since the distribution of N k is conditionally, given Y k , Poisson with mean 
( A i — A k )Y k and since Y k < s, it follows that Nk is stochastically smaller than the 
Poisson random variable N that has mean Ais = as. Thus for any positive integer 
p we have E{Y£) < s p and E{N k p ) < E(N P ) < oo. 

For any nonnegative quantities . . • , w n and positive integer p we have 

(]r)” =1 tvi) p < n p max(u> p ) < n p 5Zr=i W * P an< ^ thus 

( 12 ) 

»=1 i=l 

By applymg (12) to the form of V 3 k given in (11), we obtain 

Nk 

V 3k p <N k P Y t [a(X^)] p 

j = i 

^(V 3 fc p ) < E(N k p+1 )E{[a(X lk )} p } 

< ^?(iV p+1 )F;{[a(X)] p } 

where N has a Poisson distribution with mean as and X has the density g(i; <t>). 
All positive moments of a(x) exist for the family of densities in (7). In summary 
we have E(Vf k ) < Bi p where J5, p ,t = 1,2,3 do not depend on k. 

To complete the proof of Theorem 2, we again use (12) to obtain U k 2 < 9 
(^ Vi k ~ Mi ft) 4 - Since (V,* — m k ) 4 < (V,* + Mi k ) 4 , the binomial expansion can be used 
to show that E(U k ) < B where B is finite and does not depend on k. Thus the 
limit in (9) is zero, which proves Theorem 2. 

6. Final Remarks 

Software testing counters (Huang, 1977) will tend to over count the number of 
times a fault produces an error in the output. An alternative method, which will 
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accurately give the fault hitting frequencies, has been described in the literature on 
multiversion progr amming (Avizienis and chen, 1977). To describe this method in 
the context of error recapture experiments, let P\,P 2 , • . • denote successive versions 
of the original program Po , where P t - is the result of correcting the fault detected in 
Pi-i at time T{. A copy of P,_i is made before correcting this fault and all of the 
versions (P 0 , Pi, • • . , P,) are run on the same input series during the interval (T,, t). 
To determine the hitting frequency of the fault detected at time T,-, the outputs of 
P,_i are compared with those of P,-. Any difference in the outputs is due to the 
fault that resides in P,_i which has been corrected in P t -. Similarly, comparing the 
outputs of all pairs (P»_i, P,), i = 1, 2, . . . , r will yield the total set of fault hitting 
frequencies. 
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Model 


Estimator 


Variance 


Poisson, 

6 = <*exp[<£(t - 1) - tp(<j>)]/(i - 1)!, 
a > 0, — oo < <f> < oo, V’(^) = 

A r+ i = a f u r ~ 1 e~' t du/(r — 1)! 

Jo 


1=1 

1=1 1=1 


2a(rs) _1 

2[ars0' / ] _1 


Geometric, 

6' = aexp[<£(i - 1) - 
a > 0, <j> < 0, = - ln(l - e*) 

A r+ i = ae^ r 


r 

ol= 

t=i 

r r 

*=-ln(l + £Mj/ £(< - 1)M.) 
1=1 1=1 


2a(rs) _1 

2[arst/>"]~ 1 


Logarithmic Series, 

ti = ccexp[<f>i - tl>(<j>))/i, 

a > 0, 4> < 0 

rp(<f>) = - ln[— ln(l - e*)] 
re* 

A r+ i = ae~^^ I u r ~ 1 e~ u du 

Jo 


* = Y,Milr 
1 = 1 

(1 — e~ '*) ln(l — e~*) 
= ±M,/±,M. 

t=l t = l 


2a(rs)~ 1 
2 [arst//'] -1 


Table 1. Moment estimators of ( a,<j > ) for the Poisson, Geometric and Logarithmic series rate 
models. 
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